Los Angeles, the second-largest city in the United States, is known for Hollywood and beautiful weather. Crime might not be the first thing that comes to mind when you think of the city, but LA is no stranger to it. Data is no different from any other content collected or curated by humans: it isn’t always easy to digest, and this post contains material that may be distressing. Trigger warning: Crime, Violence (Physical and Sexual)
I hope this project shows that it is possible to get results quite quickly without pointing and clicking in Microsoft Excel. Please check the RMD file for the code.
# Name things well: don't reuse the names of datasets that already ship with a loaded package (e.g. iris).
library(readr)  # provides read_csv()
LAcrimesdata <- read_csv("~/Documents/LACrime/Crime_Data_From_2010_to_Present.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## `Crime Code` = col_integer(),
## `Victim Age` = col_integer(),
## `Premise Code` = col_integer(),
## `Weapon Used Code` = col_integer(),
## `Crime Code 1` = col_integer(),
## `Crime Code 2` = col_integer()
## )
## See spec(...) for full column specifications.
## [1] "DR Number" "Date Reported"
## [3] "Date Occurred" "Time Occurred"
## [5] "Area ID" "Area Name"
## [7] "Reporting District" "Crime Code"
## [9] "Crime Code Description" "MO Codes"
## [11] "Victim Age" "Victim Sex"
## [13] "Victim Descent" "Premise Code"
## [15] "Premise Description" "Weapon Used Code"
## [17] "Weapon Description" "Status Code"
## [19] "Status Description" "Crime Code 1"
## [21] "Crime Code 2" "Crime Code 3"
## [23] "Crime Code 4" "Address"
## [25] "Cross Street" "Location"
Above are the 26 variables available in the LA crimes data from https://data.lacity.org/A-Safe-City/Crime-Data-From-2010-to-Present/y8tr-7khq
## [1] 1570615
There are 1,570,615 crimes in this dataset.
## [1] "(33.9829, -118.3338)" "(34.0454, -118.3157)" "(33.942, -118.2717)"
## [4] "(33.9572, -118.2717)" "(34.2009, -118.6369)" "(34.0591, -118.2412)"
## [1] "33.9829" "34.0454" "33.942" "33.9572" "34.2009" "34.0591"
## [1] " -118.3338" " -118.3157" " -118.2717" " -118.2717" " -118.6369"
## [6] " -118.2412"
I separate the Location variable into Latitude and Longitude to make it easier to visualize the crimes on a map later on!
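The split can be sketched in base R roughly as follows. This is a minimal illustration on three of the values shown above, not necessarily the approach used in the RMD file; the variable names (`location`, `latitude`, `longitude`) are my own, and the conversion to numeric is an extra step that the printed output above does not show.

```r
# Toy stand-in for the Location column; values copied from the output above.
location <- c("(33.9829, -118.3338)", "(34.0454, -118.3157)", "(33.942, -118.2717)")

# Drop the parentheses, then split each string on the comma.
stripped <- gsub("[()]", "", location)
parts <- strsplit(stripped, ",", fixed = TRUE)

# Take the first and second piece of each pair.
# as.numeric() tolerates the leading space left in front of the longitude.
latitude  <- as.numeric(vapply(parts, function(p) p[1], character(1)))
longitude <- as.numeric(vapply(parts, function(p) p[2], character(1)))

print(data.frame(latitude, longitude))
```

Keeping the two pieces as numbers rather than strings should make them easier to pass to a mapping function later.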
## term V1
## 1 DR Number 0
## 2 Date Reported 0
## 3 Date Occurred 0
## 4 Time Occurred 0
## 5 Area ID 0
## 6 Area Name 0
## 7 Reporting District 0
## 8 Crime Code 0
## 9 Status Description 0
## 10 Address 0
Here, I can see which variables have no missing values.
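One way to produce a per-column count of missing values is sketched below. The small `crimes` data frame and its column names are hypothetical stand-ins for the real table, and this is only an illustration of the idea, not the code from the RMD file.

```r
# Hypothetical miniature of the crime table (made-up columns and values).
crimes <- data.frame(
  dr_number  = c("1", "2", "3"),
  victim_age = c(30L, NA, 41L),
  weapon     = c(NA, NA, "HAND GUN"),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(crimes))

# Keep only the names of columns with zero missing values.
complete_columns <- names(na_counts)[na_counts == 0]

print(na_counts)
print(complete_columns)
```

Sorting `na_counts` would give a view similar to the table above, with the fully populated columns grouped together.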
I wanted to test whether crimes truly happen more at night. The first round of graphs shows a lot of peaks, but this data might need a bird’s-eye view first to get an overall picture.
This shows that the largest number of crimes occurs around 12 o’clock noon. That is interesting, considering that people are told to avoid going out in the dark or at night. I decided to divide the data into day and night to see whether anything stands out when I zoom out even further. Let’s mark day as between 0600 (inclusive) and 1800 (exclusive), and night as between 1800 (inclusive) and 0600 (exclusive).
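The day/night rule above can be sketched like this. It assumes the Time Occurred values are 24-hour "HHMM" strings (the column was read in as character); the sample times and the names `time_occurred`, `period`, and `period_counts` are made up for illustration.

```r
# Hypothetical sample of Time Occurred values as 24-hour "HHMM" strings.
time_occurred <- c("0001", "0559", "0600", "1200", "1759", "1800", "2330")

# Convert to integers so the boundaries can be compared numerically.
time_num <- as.integer(time_occurred)

# Day: 0600 (inclusive) to 1800 (exclusive). Everything else is night,
# which covers 1800 (inclusive) through 0600 (exclusive).
period <- ifelse(time_num >= 600 & time_num < 1800, "day", "night")

# Count crimes per period.
period_counts <- table(period)
print(period_counts)
```

Both windows are 12 hours long, so the two counts can be compared directly without adjusting for the length of each period.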
There are more crimes during the day than at night! This contradicts my initial expectation (and conventional wisdom).
## Warning in validateCoords(lng, lat, funcName): Data contains 9 rows with
## either missing or invalid lat/lon values and will be ignored